Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "Switch to go 1.17" #618

Merged
merged 3 commits into from Nov 29, 2021
Merged

Revert "Switch to go 1.17" #618

merged 3 commits into from Nov 29, 2021

Conversation

vincentbernat
Copy link
Member

This reverts commit fe5e5b4. We
switch back to 1.16.5 just for testing. It was known to work with
1.35.0. If it works, we need to try 1.16.10 then.

This reverts commit fe5e5b4. We
switch back to 1.16.5 just for testing. It was known to work with
1.35.0. If it works, we need to try 1.16.10 then.
@vincentbernat vincentbernat marked this pull request as draft November 25, 2021 20:49
@github-actions
Copy link

📷 Snapshot created

@andrew-susanto
Copy link

andrew-susanto commented Nov 27, 2021

Running pr-618/SNAPSHOT-9483d8b version on EdgeRouter for about 33 hours and having no issue.

@dbeusink
Copy link

Same here. Runs here for over 30 hours without interruption.

@deman3417
Copy link

With 9483 edgerouter working smoothly.

This is the latest release in 1.16.
@rs
Copy link
Contributor

rs commented Nov 27, 2021

Please remove winio imported pipe test so we get it green.

@github-actions
Copy link

📷 Snapshot created

@vincentbernat
Copy link
Member Author

vincentbernat commented Nov 27, 2021

If some of you can test the same version with Go 1.16.10 and report if it still works, it would be helpful. While we can surely stay with Go 1.16.10 if the issue is with Go 1.17, we cannot keep an outdated compiler. If it works with Go 1.16.10, it gives us some time to understand what broke between Go 1.16 and Go 1.17. If it does not work, we have even fewer changes to understand.

  • To test with Go 1.16.5: NEXTDNS_VERSION=pr-618/SNAPSHOT-9483d8b sh -c 'sh -c "$(curl -sL https://nextdns.io/install)"'
  • To test with Go 1.16.10: NEXTDNS_VERSION=pr-618/SNAPSHOT-4b5a536 sh -c 'sh -c "$(curl -sL https://nextdns.io/install)"'

@vincentbernat
Copy link
Member Author

Please remove winio imported pipe test so we get it green.

Not enough to make it green. Maybe I can just disable the static check for Windows? Getting green is not that important or do you plan to merge if OK?

@github-actions
Copy link

📷 Snapshot created

@rs
Copy link
Contributor

rs commented Nov 27, 2021

Ok to disable windows and yes, will merge regardless.

@Tuinslak
Copy link

I may be missing something here but:

yeri@sg-erl:~$ NEXTDNS_VERSION=pr-618/SNAPSHOT-4b5a536 sh -c 'sh -c "$(curl -sL https://nextdns.io/install)"'
INFO: OS: edgeos
INFO: GOARCH: mips64_softfloat
INFO: GOOS: linux
INFO: NEXTDNS_BIN: /config/nextdns/nextdns
INFO: INSTALL_RELEASE: pr-618/SNAPSHOT-4b5a536
u) Upgrade NextDNS from 0.0.0-SNAPSHOT-9483d8b to pr-618/SNAPSHOT-4b5a536
c) Configure NextDNS
r) Remove NextDNS
e) Exit
Choice (default=u): u
INFO: Upgrading NextDNS...

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
u) Upgrade NextDNS from 0.0.0-SNAPSHOT-9483d8b to pr-618/SNAPSHOT-4b5a536
c) Configure NextDNS
r) Remove NextDNS
e) Exit

@github-actions
Copy link

📷 Snapshot created

@andrew-susanto
Copy link

andrew-susanto commented Nov 27, 2021

Looks like the new commit on this PR makes the old snapshot unable to be accessed. When you try to access it shows Access Denied like https://snapshot.nextdns.io/pr-618/nextdns-SNAPSHOT-9483d8b_linux_mips64le_softfloat.tar.gz.

The newest commit in this PR using Go 1.16.10 :
NEXTDNS_VERSION=pr-618/SNAPSHOT-f535f8c sh -c 'sh -c "$(curl -sL https://nextdns.io/install)"'

@Tuinslak
Copy link

Okay -- running that version now for ~24hr

@deman3417
Copy link

Nov 27 17:09:49 DNS-server nextdns[1754]: NextDNS 0.0.0-SNAPSHOT-9483d8b/linux stopped
Nov 27 17:09:50 DNS-server nextdns[6760]: Starting NextDNS 0.0.0-SNAPSHOT-f535f8c/linux on :53
Nov 27 17:09:50 DNS-server nextdns[6760]: Starting mDNS discovery
Nov 27 17:09:50 DNS-server nextdns[6760]: Listening on TCP/:53
Nov 27 17:09:50 DNS-server nextdns[6760]: Listening on UDP/:53
Nov 27 17:09:55 DNS-server nextdns[6760]: Connected [2a07:a8c0::]:443 (con=11ms tls=3181ms, TCP, TLS13)
Nov 27 17:09:55 DNS-server nextdns[6760]: Setting up router
Nov 27 17:09:58 DNS-server nextdns[6760]: Connected [2a0d:5642:115:197:5054:ff:fef3:ccd4]:443 (con=11ms tls=2839ms, TCP, TLS13)
Nov 27 17:09:58 DNS-server nextdns[6760]: Switching endpoint: https://dns.nextdns.io#185.18.148.91,2a04:b80:1:30::2,194.110.115.97,2a0d:5642:115:197:5054:ff:fef3:ccd4
Nov 27 17:09:58 DNS-server nextdns[6760]: Query 192.168.2.200 UDP AAAA cloudsync-tw.synology.com. (qry=43/res=12) 6423ms : doh resolve: context deadline exceeded
Nov 27 17:09:58 DNS-server nextdns[6760]: Query 192.168.2.200 UDP A cloudsync-tw.synology.com. (qry=43/res=12) 6425ms : doh resolve: context deadline exceeded

@vincentbernat
Copy link
Member Author

OK, only recently accessed tarballs are still in the cache. If you happen to want to rollback to the previous version, you can fetch it from here: https://github.com/nextdns/nextdns/actions/runs/1510763854 (the dist archive). I let another person confirm the problem appears with f535f8c while it did not happen with 9483d8b before reaching any conclusion.

@vincentbernat
Copy link
Member Author

If this snapshot fails for you, feel free to revert to the one in #620. I'll keep #620 unmodified from now on and will try to bisect the issue using #618. Currently, everyone reported that Go 1.16.5 works and @deman3417 reported that Go 1.16.10 fails. I am waiting for a second report to try Go 1.16.8. A lot of changes where introduced in 1.16.10, but picking 1.16.8 will reduce the number of tries is the problem is in a previous version.

@andrew-susanto
Copy link

andrew-susanto commented Nov 27, 2021

So far f535f8c makes no problem on my side. Several context deadline exceeded is expected after switching endpoint or starting nextdns service. It also hit me on 9483d8b. It has very minimal impact on the network since it happens very rarely.

Nov 27 22:53:31 ubnt nextdns[9099]: Starting NextDNS 0.0.0-SNAPSHOT-f535f8c/linux on :53
Nov 27 22:53:31 ubnt nextdns[9099]: Listening on TCP/:53
Nov 27 22:53:31 ubnt nextdns[9099]: Starting mDNS discovery
Nov 27 22:53:31 ubnt nextdns[9099]: Listening on UDP/:53
Nov 27 22:53:36 ubnt nextdns[9099]: Setting up router
Nov 27 22:53:41 ubnt nextdns[9099]: Connected 45.90.28.0:443 (con=21ms tls=3855ms, TCP, TLS13)
Nov 27 22:53:45 ubnt nextdns[9099]: Connected 45.114.118.156:443 (con=23ms tls=3269ms, TCP, TLS13)
Nov 27 22:53:45 ubnt nextdns[9099]: Switching endpoint: https://dns.nextdns.io#34.101.84.129,45.114.118.156
Nov 27 22:56:08 ubnt nextdns[9099]: Connected 45.114.118.156:443 (con=11ms tls=37ms, TCP, TLS13)
Nov 27 22:57:37 ubnt nextdns[9099]: Received signal: hangup (ignored)
Nov 27 22:58:02 ubnt nextdns[9099]: Connected 45.114.118.156:443 (con=0ms tls=0ms, TCP, )
Nov 27 23:03:16 ubnt nextdns[9099]: Connected 45.114.118.156:443 (con=12ms tls=41ms, TCP, TLS13)
Nov 27 23:08:02 ubnt nextdns[9099]: Connected 45.114.118.156:443 (con=23ms tls=65ms, TCP, TLS13)

@deman3417
Copy link

I ran another log.
It did not occur again after those 2 times.
In the meantime I have become a bit paranoid

@dbeusink
Copy link

Also upgraded from commit 9483d8b to f535f8c. So far no problems, I will keep monitoring it.

@dbeusink
Copy link

Running f535f8c without interruption for more than 18 hours now.

@Tuinslak
Copy link

Tuinslak commented Nov 28, 2021

f535f8c seems fine here as well (~18hrs)

nextdns version 0.0.0-SNAPSHOT-f535f8c

@marcineq
Copy link

9483d8b is fine for me as well. No errors for the last 48h. Now giving a try with f535f8c .

@dbeusink
Copy link

Running f535f8c without interruption for more than 40 hours now.

@vincentbernat
Copy link
Member Author

This one seems good enough until we figure the root cause. Let's merge it.

@vincentbernat vincentbernat marked this pull request as ready for review November 29, 2021 10:31
@rs rs merged commit cd8823e into nextdns:master Nov 29, 2021
@marcineq
Copy link

f535f8c is fine as well - not a single occurence of error for the last 2 days.

@Tuinslak
Copy link

Tuinslak commented Dec 3, 2021

This morning it seems to have crashed.

Back at ~100+% cpu

Screen Shot 2021-12-03 at 09 02 56

Screen Shot 2021-12-03 at 09 03 03

@dbeusink
Copy link

dbeusink commented Dec 3, 2021

Here is an update from my side. It's running fine for 6 days now on my USG-3P v4.4.56. The high CPU as shown by @Tuinslak never happened on my device, also not with the old version. The only problems I had were the "context deadline exceeded" messages resulting in a temporal DNS resolving outage on my network.

@Tuinslak
Copy link

Tuinslak commented Dec 3, 2021

I do get those messages too, but I also get these on Linux servers (= non-routers) and doesn't seem to be breaking anything.

yeri@sg-erl:~$ nextdns log
Dec  3 15:32:44 sg-erl nextdns[9170]: Query 10.60.111.200 UDP AAAA Sauron. (qry=24/res=12) cache fallback HTTP/2.0: doh resolve: context deadline exceeded
Dec  3 15:32:45 sg-erl nextdns[9170]: Query 10.60.111.200 UDP AAAA Sauron. (qry=24/res=12) cache fallback HTTP/2.0: doh resolve: context deadline exceeded
Dec  3 15:32:51 sg-erl nextdns[9170]: Query 10.60.111.200 UDP AAAA Sauron. (qry=24/res=12) cache fallback HTTP/2.0: doh resolve: context deadline exceeded
Dec  3 15:50:44 sg-erl nextdns[9170]: Connected 103.62.48.147:443 (con=4ms tls=26ms, TCP, TLS13)
Dec  3 15:51:48 sg-erl nextdns[9170]: Connected 217.146.9.93:443 (con=3ms tls=53ms, TCP, TLS13)
yeri@sg-erl:~$ date
Fri Dec  3 16:06:32 +08 2021

for what it's worth, the version of my edge router lite:

yeri@sg-erl:~$ show version
Version:      v2.0.9-hotfix.2
Build ID:     5402463
Build on:     05/11/21 13:17
Copyright:    2012-2020 Ubiquiti Networks, Inc.
HW model:     EdgeRouter Lite 3-Port
HW S/N:       788A207F4291
Uptime:       16:13:34 up 13 days,  3:26,  1 user,  load average: 0.80, 0.66, 0.53
yeri@sg-erl:~$ uname -a
Linux sg-erl 4.9.79-UBNT #1 SMP Tue May 11 13:20:59 UTC 2021 mips64 GNU/Linux
yeri@sg-erl:~$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
VERSION_CODENAME=stretch
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

@marcineq
Copy link

marcineq commented Dec 3, 2021

so far, so good with f535f8c - over 5 days and not a single issue on ER-X.

@bohdan-s
Copy link

I can update this is still an issue with v1.37.11 (Go 1.18). Upgrading causes 100% CPU usage spike within 1 hour.
Reverting back to 1.16.10 resolves the issue again f535f8c

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants